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Abstract 


Multidimensional item response models can be based on multivariate normal ability distributions 
or on multivariate polytomous ability distributions. For the case of simple structure in which 
each item corresponds to a unique dimension of the ability vector, some applications of the 
two-parameter logistic model to empirical data are employed to illustrate how, at least for the 
example under study, comparable results can be achieved with either approach. Comparability 
involves quality of model fit as well as similarity in terms of parameter estimates and computational 
time required. In both cases, numerical work can be performed quite efficiently. In the case of the 
multivariate normal ability distribution, multivariate adaptive Gauss-Hermite quadrature can be 
employed to greatly reduce computational labor. In the case of a polytomous ability distribution, 
use of log-linear models permits efficient computations. 
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Multidimensional item response models are well-known in the psychometric literature but 
relatively little used in practice (Reckase, 2007). In this report, simple-structure multidimensional 
two-parameter logistic (2PL) models are considered in which each item is associated with one 
coordinate of the ability vector (Zhang, 2004). This restriction simplifies analysis to a considerable 
degree relative to approaches in which the relationship of items to coordinates of the ability vector 
is not specified. Two distinct models are considered for the distribution of the ability vector. 

In the first case, the ability vector is assumed to be a multivariate normal random vector with 
mean 0 and with a covariance matrix that has all diagonal elements equal to 1, so that each 
coordinate has variance 1, and with unknown off-diagonal elements that are the correlations of 
the coordinates of the ability vector. In the second case, the ability vector is assumed to have 
polytonrous coordinates, a choice that may be able to reduce the computational burden associated 
with multidimensional model, but one that may seem less familiar than assuming a normal 
distribution. In the polytonrous case, the realizations of each coordinate of the ability vector are 
from a discrete and finite set of real valued ability levels. Unidimensional models of this type are 
sometimes referred to as discrete latent trait models (Heinen, 1996) or located latent class models 
(Formann, 1992). In the case of multidimensional discrete latent traits, the term diagnostic models 
(von Davier, 2005) is employed. Each of the coordinate sets of ability levels may be different, and, 
for a given coordinate, it is common to use evenly spaced integers, so that the set of levels for 
a coordinate might be the two-member set {—1,+1} or the three-member set { —1,0,+1}. Sets 
used will often have four or more elements. In both models for the ability vector, algorithms are 
provided for computation of maximum likelihood. These algorithms are sufficiently efficient so 
that complete data from an assessment can be analyzed rapidly enough for practical use. 

By means of the expected log penalty criterion (Gilula & Haberman, 1994), the two cases 
are compared in terms of their effectiveness at describing the observed data. In addition, the two 
cases are compared in terms of reliability of ability parameter estimates provided by the models. 
Approaches used do not assume that any model examined is valid, and comparisons involve 
measurement of the quality of prediction of response patterns rather than test of goodness of fit. 

The basic conclusion suggested by the example studied is that the choice of latent-variable 
distribution has remarkably limited effect. This conclusion is consistent with a previous 
one-dinrensional analysis (Haberman, 2005a), although it is possible that other examples can be 
found in which larger differences between model performance are evident. 
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In section 1, the multivariate two-parameter logistic (2PL) model under study is introduced. 
Section 2 considers application to a multivariate normal ability distribution. Section 3 considers 
application to multivariate polytomous ability distributions. Section 4 illustrates application of 

TM 

results to a Praxis administration. Section 5 provides conclusions based on the empirical results 
observed. 


1 The Multidimensional 2PL Model 

In the general model under study, a test is considered with q > 2 right-scored items. A sample 
of n > 2 examinees is used in analysis of the data. For examinee i, 1 < i < n, for item j, 1 < j < q, 
Xij is 1 if the response to item j is correct, and X t] is 0 otherwise. The g-dimensional vectors Xj 
with coordinates X t j , 1 < j < q, are independent and identically distributed for examinees i from 
1 to n, and the set of possible values of Xj is denoted by T. 

The basic 2PL model under study assumes that an ?’-dimensional random ability vector di 
with coordinates Oik, 1 < k < r, is associated with each examinee i. The pairs (Xj, 6{), 1 < i < n, 
are independent and identically distributed, and, for each examinee i, the response variables X^, 
1 < j < q. are conditionally independent given 6, . Let 

P(h; y) = exp(hy)/[l + exp(y)] 


for h and y real. 

To each item j, 1 < j < q, corresponds an ability coordinate v{j), 1 < v(j) < r. For an 
unknown item discrimination a.j and an unknown real parameter 7 j, if u: is a d-dimensional vector 
with coordinates Uk, 1 < k < r, then the conditional probability that X l3 = h given 0 j = u is 
P{h',ajUJ v tj ) — 7 j). Provided that the discrimination a,j is positive, the item difficulty for item j is 
then 7 j/cij = bj. If r is 1, then one has a one-dinrensional 2PL model, for 


P(xij\aj<jj\ - 7 j) 


exp [a; jj 07(07 — bj)] 
1 + exp[oj(o;i — bj)] 


If, in addition, all cij are equal, then one has a one-dinrensional one-parameter logistic 
(1PL) model. This model may also be termed a Rasch model. If r > 1 , then the 2PL model is 
multidimensional. For r > 1, the assumption is made that, for 1 < k < r, the set v~ l (k) of items 
j , 1 < j < q, with v(j) = k is nonempty, so that each coordinate 0\k of 0, corresponds to at least 
one item. If cij is constant for j in u _ 1 (fc), 1 < k < r, then one has a multidimensional 1PL model. 
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In all cases under study, a restrictive model is used for the distribution of the ability vector 
0j. In section 2, is assumed to have a multivariate normal distribution with mean 0 and with a 
covariance matrix that has all diagonal elements equal to 1 , so that each 6 ^ is assumed to have 
variance 1. In section 3, the distribution of 0, is assumed to have all mass on a known finite set 
11 , which represents the possible values of a multidimensional discrete ability vector. 

2 The Multivariate Normal Case 

In the multivariate normal case, the assumption is made that 0, has a multivariate normal 
distribution JV(0,D). Here 0 is the r-dimensional vector with all coordinates 0, and D is an 
r-by-r positive-definite symmetric matrix with elements dkk ', 1 < k < r, 1 < k! < r, such that 
each diagonal element dkk = 1, and dkk ', k ^ k r , is the unknown correlation of Oik and Ow- The 
assumption that the mean of 6 t is 0 and the variance dkk of each 9ik is 1 is imposed to permit 
identification of the item parameters cij and bj for each item j from 1 to q. For comparison 
with the polytomous case presented in section 3, let d km be row k and column m of D 1 for 
1 < k < m < r, let |D| denote the determinant of D, and let 8km be 1 for k = m and 0 otherwise. 
Then the density pe of 0, at a vector u with coordinates uj k , 1 < k < r, satisfies 

logpeM = ^ [(1 - 8 km /2)d km }u k u; m . (1) 

k =1 m=1 

2.1 Model Parameters 

The multivariate normal case can be parametrized so that a version of the stabilized 
Newton-Raphson algorithm (Haberman, 1988) can be readily applied. The basic requirement 
involves an appropriate decomposition of D. If r is 1, then D reduces to the one-by-one matrix 
with element 1, and D = FF 1 , where F and its transpose F r equal D. By use of the Cholesky 
decomposition (Stewart, 1973, p. 134), it follows that, if r > 1, then D is determined by 
unique real constants T k k ', 1 < k! < k < r, by the decomposition D = F(r)[F(r)] / . Here r is 
an r(r — 1 )/ 2 -dimensional vector with element k! + k(k — l)/2 equal to T k k' for 1 < k 1 < k and 
1 < k < r, and F(r) is an r-by-r matrix with elements fkk’( T )i l<k<r,l<k' <r. The upper 
triangular elements fkk'{ T ) = 0 f° r 1 <k<k'<r. Let 

/ fc-i \ 1 / 2 

u k{r) = ( 1 + T kk> ) 

V k '=1 / 
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for 2 < k < r, and let z'i(r) be 1 . The diagonal element fkk { T ) = V z/ fc( T ) f° r each integer k, 

1 < k < r, the first column in the lower triangle of F(r) satisfies f k \{r) = "Tci/OcM for 1 < k < r, 
and the remaining lower triangle of F(r) satisfies 


r ( \ T kk' 

Jkk'\ r ) — / \ / \ 

M T ) u i fe'(r) 

for 1 < k' < k < r. The constants r kk > can have any combination of real values. If r = 2, then 
t 21 = c? 2 i/(l — d> 2 1 ) 1 / 2 , where c? 2 i is the correlation of 9i2 and On. The distribution of Oi under 
the model is the same as the distribution of F(r)Z, where Z is an r-dimensional random vector 
with independent coordinates Z k with standard normal distributions, 1 < k < r. Let 0 be the 
density function of the standard normal distribution, and let <f> r be the function on R r such that 
4> r (z) = nU f° r each r-dimensional real vector with coordinates z k , 1 < k < r. Thus <f> r is 
the density of Z. 

Consider the vector (3 with u = 2q + r(r — l)/2 coordinates f3j, 1 < j < u such that f3j = aj 
for 1 < j < q, (3 q+j = 7 j for 1 < j < q, and /3 2<? +fc'+(fc-i)(fc-2)/2 is r kk > if 1 < k' < k < r. Let r(/3) 
be the r(r — l)/2-dimensional vector with elements ^q+h f° r 1 < h < r(r — l)/2. Let R(/3) be the 
one-by-one identity matrix if r is 1. Otherwise, let R(/3) be F(r(/3)). 

For any (/-dimensional vector x with all coordinates 0 or 1, the probability that Xj = x is then 

p( x ; (3) = J p (x|R(/3)z; (3) <j> r (z)dz. 


For the ?’-dimensional vector u> with coordinates LV k , 1 < k < r, 

9 

p{x\u;f3) = P{xj,f3jU v (j) - (3 q+j ) 

3 =1 

is the conditional probability that Xj = x given that 9 j = u>. If, for 1 < k < r, 

9 

Sfe(x,/3) ^ (j) j 

3=1 


and if 


then 


9 

V(x,u;(3) = J] 


j=i 


exp(-(3 q+j Xj) 

1 + exp(/3 ? -o;„( J -) - (3 q+j ) ’ 


p(x|w;/3) 


r 


V(x, cu; (3) exp 


^s fc (x; /3)w fc 

_fc=i 


4 



The log likelihood function is then 

n 

i=i 

where 


£i((3) = log p(Xi](3), 1 <i<n. 


If 


K it {z) =p(Xj|R(/3)z;/3)</» r (z) 


for r-dimensional vectors z, then 

Zi(l 3) = log J K it (z)dz. 


2.2 The Stabilized Newton-Raphson Algorithm 

The likelihood function may be maximized by a simple variation on the stabilized 
Newton-Raphson algorithm (Haberman, 1974a, 1988). It is also possible to use the EM algorithm 
(Dempster, Laird, & Rubin, 1977); however, because the Hessian matrix of the log likelihood 
is not used in computations in this case, the EM algorithm is less helpful for estimation of 
asymptotic variances. The one major complication is the problem of r-dimensional quadrature. 
Adaptive Gauss-Hermite integration is appropriate for this problem (Haberman, 2006), although 
the multidimensional version of adaptive integration is a bit more complex than is the univariate 
version. Consider use of s(k) quadrature points for dimension k, 1 < k < r. Let Vkh and ykh , 

1 < h < s(k), be defined so that 

*(fc) r 

/ y m <f>(y)dy 

e=l d 

for 1 < rri < 2 s(k) — 1. Let (3 denote the maximum-likelihood estimate of f3, so that £((3) is the 
supremum 7* of £(j3) for all possible i/-dimensional vectors (3. Consider an iteration t > 0 of 
the stabilized Newton-Raphson algorithm. Let H be the set of all r-dimensional vectors h with 
coordinates h(k), 1 < h(k) < s(k), 1 < k < r. Thus H has ni=i s (^) elements. Let yh be the 
vector with coordinates yh(k) f° r 1 < A: < r. Then 

/ vr(z)0 r (z)dz = E ^(yh) II Vkh(k) 

' h eH k =1 

whenever 7r(z) is a polynomial such that no power of a coordinate Zk exceeds 2 s(k) — 1. 
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To apply adaptive quadrature, consider an iteration t > 0. At the start of the iteration, let 
(3 t be an approximation for the maximum-likelihood estimate (3 of (3 . The standard formula in 
calculus for change of variables permits £i(/3) to be approximated by a function 

M/3) = log L it (P), 

where 

r 

Util 3) = IWfll" 1 ^2 [M(uith)/</» r (y h )] n Vkh(k )> 

h GH k= 1 

Wth = (W' t )~Vh + Zj, 

|Wjt| is the determinant of W^, is an r-by-r matrix with coordinates wttkk'-, 1 < k < r, 

1 < y < r, wnkk is positive for 1 < k < r, wukk’ = 0 for 1 < k < k' < r, zu is an approximation to 
the location of the maximum over z in R r of Ku( z), and W,(Wj ( = —V 2 Rjt( z *i) for the Hessian 
matrix V 2 /ijt(zjf) of Ka at z.; t . Note that |W^| is the product of the wukk for 1 < k < r (Rao, 
1973, p. 23). 

With the starting value / 3 t , one step of the stabilized Newton-Raphson algorithm is applied 
to 1st = Yh =1 hi to yield a new approximation, 

Pt.+l = Pt + a iCt- 

To define cq and let s and n* < 1/2 be given positive constants, let |z| be maxi <_,■<„ \zj\ for a 
i/-dimensional vector z with coordinates Zj, 1 < j < v, let V£st. be the gradient of 1st, let V 2 £st 
be the Hessian matrix of It, let I be the v-by-v identity matrix, let 

A t = -V 2 £t(P t ) + ctl 

be positive definite, let 

Ct = A 

let |Ctl < k, and let at > 0 satisfy 

tst(0t+ 1 ) - M(/3 1 ) > a t K*C' t V£st(Pt)- (2) 

Here ct is 0 if this choice satisfies the conditions that At is positive definite and |£ t | < k. 
Otherwise, ct is obtained by letting c| be the maximum absolute value of a diagonal element of 
V 2 £t(/3t) and successively trying n*Ct, (1 + 2 2 )k*Cj, (1 + 2 2 + 3 2 )k*c|, and so on. If (2) is satisfied 
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with at = 1, then at is set to 1. In general, at is found by use of a rough approximation to 
the maximum of £st(Pt + a Ct) for a > 0 (Haberman, 1974a, 2006). The choices of n = 2 and 
k* = 1/16 are used in calculations reported in this report. 

For the example studied in this paper, use of Sk = 4 for each k was quite adequate for a 
case with r = 4, q = 118, and 29 or 30 items associated with each coordinate 9 The choice 
of Sk = 3 for each coordinate k was also acceptable, and even Sk = 2 for each coordinate k was 
tolerable. These relatively small values are important, for s*. = 4 for each k and r equals 4 
leads to 256 quadrature points, while Sk = 3 for each coordinate leads to 81 quadrature points, 
and Sk = 2 for each coordinate leads to 16 quadrature points. The relatively small number of 
points required is consistent with existing literature (Schilling & Bock, 2005). The quadrature 
situation with adaptive quadrature is far better than with the nonadaptive quadrature approach 
used in the National Assessment of Educational Progress (NAEP). This approach, found in the 
NAEP BGROUP program, uses 41 points for each coordinate (Sinharay & von Davier, 2005), 
so that, for r = 4, 41 4 = 2,825,761 quadrature points would result. In practice, for more than 
two dimensions, NAEP uses the CGROUP program. This program employs a generalization of 
Laplace approximations for integral evaluation, so that the actual computational labor is much 
less than suggested by this comparison. Nonetheless, accuracy of the Laplace approach is an issue. 

2.3 Estimated Expected Log Penalty 

To evaluate the model, consider the expected log penalty 

H( z) = -q~ 1 E(£ i ( z)) 

per item (Gilula & Haberman, 1994). Consider the minimum I of H(z) for zz-dimensional vectors 
z such that Zj > 0 for 1 < j < q. Let H((3) = I. If the 2PL model with multivariate normal ability 
vector is correct, then f3 is defined as in the model definition and I is the entropy per item of the 
vector X,;. If /3 is uniquely defined, then (3 converges to (3 with probability 1 as the sample size n 
goes to oo, whether or not the model holds, and I = (ng) -1 ^* converges to I. Let 

Z = E(-V 2 £ 8 (/3)), 
let 

Y = E(V£ i (/3)[V£ i ( / 3)] / ), 
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and let tr denote a trace. Then n 1/,2 (/3 — (3) converges in distribution to a normal random vector 
with mean 0 and covariance matrix Z'^ 1 YZ“ 1 . If the model holds, then Y = Z and the covariance 
matrix is Y _1 . The scaled difference n 1/,2 (I — I) converges in distribution to a normal random 
variable with mean 0 and variance equal to the variance of q~ 1 £i(/3). The expected value of I is 
less than I. As n approaches oo, 2 nq[I — E(I)] converges to -0 = tr(Z -1 Y). In addition, if Xo is 
independent of X.; for 1 < i < n and Xo has the same distribution as X,, then the conditional 
expectation Iq of the log penalty —q~ 1 l q(/3) given X*, 1 < i < n, for prediction of Xo satisfies the 
condition that nq(Io — I) converges in distribution to a random variable with expectation i/j, and 
ip is v if the model holds. More generally, ip is estimated by 'ip = tr(Z _1 Y), where 

Z = -n~ l V 2 t 0) 

and 

n 

Y = n- 1 ^V£ i 0)[Ve i m’. 

1=1 

Thus Iq may be approximated by Iq = I + i/j/(nq). In practice, is approximated by use of 
adaptive Gaussian quadrature. A simplified approximation that assumes the model is correct is 
Iao = I + v/{nq). This approximation is the Akaike information criterion (AIC) divided by the 
number of items times twice the sample size (Akaike, 1974). 

The AIC, which is used widely in model selection, balances the gain in log likelihood of 
a model (the improved model fit) against the cost in terms of parameters being estimated. 
Therefore, a model that fits the data better but needs a much larger number of parameters than 
competing models with just slightly lower estimated expected log penalty may not fare as well 
when evaluated by means of the AIC. The Gilula-Haberman criterion Iq generally leads to results 
similar to those obtained with the AIC criterion, although appreciable differences can arise when 
the model fits the data rather poorly. When sample sizes are large, I, Iq, and I a o are normally 
very similar (Gilula & Haberman, 2001). This situation is helpful when the EM algorithm is 
employed, for estimation of Jo is less readily accomplished in this case than in the case of the 
stabilized Newton-Raphson algorithm. 

2.4 Estimated Ability Parameters 

The ability parameter 0 L can be defined and approximated even if the underlying model is 
not accurate (?, ?). Let 0, be defined as a random vector such that the conditional distribution 



of 6 l given X; = x is the same as the conditional distribution of a random vector 9* given the 
random vector X* with values in T, where 9* has a multivariate normal distribution with mean 
0 and covariance matrix R(/3)[R(/3)] / and the conditional probability that X* = x in T given 
9* = ijo is p(x|w; (3). Thus the conditional density p(u;|x; (3) at u of 0j given Xj = x is given by 
Bayes’s theorem to be 


Pe |xMx;/3) = 


p{x\u;-(3)4> r ([R((3)} 1 uj 


|R(/3) b( x ; (3) 

If the model actually holds, then the definition of 9 t in the model definition is consistent with 
the definition applied here. The information per item provided by 9 t is 


A = / - I e . 


Alternatively, qA is the information that X.; provides concerning 9,. Here Iq is the expected value 
per item of the log penalty — logp(Xj|0j; (3) from use of the conditional probability approximation 
p(X,;|0.j) for the conditional probability given 9, for the observed value of X;. One has 

<? 

!e = 

3 =1 


where 


he = -^(log P(Xif,/3jO v (j) - (3 q+ j)). 


An application of Bayes’s theorem shows that Iq may be estimated by 

n 

I e = -( nq)~ l E[p( x i;/3)] _1 
i —1 


J 4 vMp(Xj; R(^)w; (3) logp(X ?: ; R(^)w; f3)du. 


It follows that A has estimate A = I — Iq. 

The conditional expectation 9 t = E(9i\Xi) of 9 b given Xj, the EAP estimate of 9 t (Bock & 
Aitkin, 1981), is found from Bayes’s theorem to be /3), where 


-^e|x( x ;/3) — 



w|x; (3)du. 


Although the expectation E(9i) = E(9.j) of 9, is the zero vector 0 under the model, the 
expectation need not be 0 if the model does not hold. The estimated conditional expectation of 
9 t given X ( is 9i = £^e|x(Xi; $)■ Computations may be performed by use of adaptive quadrature. 
One may employ 0* as an estimate of 9 t . The expectation E(9{) is then estimated by the average 

0 = n- 1 Z?=idi- 
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The covariance matrix of Oi is 


Cov(0) = E([0i - E(0i)}[0i - E{Oi)]'). 


The corresponding estimate is 

n 

Cw(0) = n -1 ^[di - 0]% - 6\'. 

i= 1 

The conditional covariance matrix Covj(0|X) of 9% given X* may be used to assess the 
accuracy with which the data X, determine 9,. One has Covj(0|X) = Covg| X (X,;; j3), where 

Covg| X (x; 0) = j[u - Eg | X (x;/3)][w - E e]x (x; f3)]'p e \^(uj\x; f3)duj. 

The estimate of Cov*(0|X) is then Covj(0|X) = Covg|x(Xj; /3). The expected conditional 
covariance matrix F7(Covj(0|X)) is then estimated by 

n 

Cov(<?|X) = n _ 1 ^Covi(fl|X). 

i =1 

For any nonzero r-dimensional vector c, the reliability of c’Oi is 

2 , . c'Cov(0)c 

p { c) = -—-~—• 

c , J E(Cov i (0|X))c + c' Cov(0)c 

The reliability of E9 L is approximately the same in large samples, and the estimated reliability is 
then 

\ c'Cov(0)c 

p (c) = - - - —• 

c'Cov(0|X)c + c'Cov(0)c 


3 The Polytomous Case 

In the polytomous case, the assumption is made that the distribution of 0* is confined to 
a finite set II with M elements. Often, the set of multidimensional ability levels il will be a 
nonempty subset of the Cartesian product ni=i ^k °f sets ilk, 1 < k < r, where ilk is a subset of 
the real line that contains Ck > 1 possible values of 60;. In typical cases, ilk is the set of integers 
from — (cfc — l )/2 to (c& — l )/2 if Ck is odd, and ilk is the set of integers — Ck — 1 + 2 d for integers 
d from 1 to Ck if q, is even. Thus ilk is {—1,1} for c*, = 2 and {—1,0,1} for q, = 3. Computations 
are most rapid if the number of elements of id is small. Thus permitting il to have fewer than 
the nU Ck elements of YYk=\ can save computational labor. Of course, such a saving is only 
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appropriate if the ability of the model to predict the joint distribution of the X* is not impaired 
to a substantial degree. 

For each u> in 12, the probability Pde{u) that Oi = u> is assumed positive, and it is assumed 
that the Pdeip 1 ) satisfy a log-linear model 

G 

logPdeM = A + T 0 (u>) + 

9= 1 

for known constants T g { cj),0 < g < G < M, and unknown parameters A and Td g , 1 < g < G. 
Given the T g (u>) and the Td g , A is determined by the requirement that the sum of the Pde(u), u) in 
12, must be 1. To provide any possibility that the r^ 9 , 1 < g < G, can be identified, it is assumed 
that no real constants u g , 1 < g < G, exist such that some u g is not zero and Y^=i u g'^'g( u} ) 
has the same value for all lo in 12. Even with these constraints on G and on the T g (u), the Td g , 

1 < 5 G, cannot be identified unless 2q + G is less than 2 q — 1 (Haberman, 2005a), and, in 
practice, identification of parameters is much more difficult unless G and the T g (u>) , 0 <g <G,gl> 
in 12 , are carefully selected. 

The basic log-linear model to consider is analogous to the multivariate normal distribution 
applied in the continuous case. One considers a log-linear model with no main effects and with 
only linear-by-linear interactions, so that, for iok the arithmetic mean of the elements of 12 *,, 

r k 

log Pde{u) = A + ^2 X] Vkmivk - Wfe)(w m - U>fc)- (3) 

k =1 m= 1 

With no restrictions imposed on the % m , this model has G = r(r + l)/2 independent parameters. 
Comparison of (1) and (3) shows that log pg and log pdo have a very similar form, especially in the 
typical case in which Uk = 0 . 

More general use of polynomials can be considered. For 1 < k < r, let Okh , 0 < h < Ck, be the 
orthogonal polynomial of degree h that corresponds to the elements of 12 k and to some positive 
weighting function Wk on 12 k, so that 

^ ^ ^k{,^k)Okh{p^k)Okrn{^k) = &hm 

for 0 < h < m < Ck■ Let S be a nonempty set of vectors £ with integer elements £(fc), 

0 < £(k) < Ck, for 1 < k < r. Assume that no vector in H has all coordinates 0. Then H defines a 
log-linear model 

r 

log Pdo(u) = A + (4) 

k =l 
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Models of this kind have a long history in the literature on log-linear models (Haberman, 1974b) 
and variations have begun to appear with general diagnostic models (Xu & von Davier, 2007). 
The model specified by (4) is equivalent to the model specified by (3) if H consists of all vectors £ 
with either two coordinates equal to 1 and all other coordinates 0 or with one coordinate equal to 
2 and all other coordinates 0. General diagnostic models have applied (4) with H consisting of all 
vectors £ that correspond to the model of (3) together with all additional vectors £, which have 
all coordinates but one equal to 0 and one nonzero coordinate with a value between two specified 
positive integers. 


3.1 Model Parameters 

As in the multivariate normal case, the polytomous case can be parametrized so that a 
version of the stabilized Newton-Raphson algorithm (Haberman, 1988) can be readily applied. 
Alternatively, polytomous discrete cases can be specified as multidimensional discrete latent trait 
models and estimated with the EM algorithm, for example using the software mdltm (von Davier, 
2005). 

For the log likelihood to be maximized, consider the vector (3 d with u d = 2q + G coordinates 


Pdj, 1 <j<u such that f3 dj = aj for 1 < j < q. P d (q+j ) = 7 j for 1 < j < q, and P d ( 2 q+ g ) is r dg for 
1 < g < G. Let 


x{P d ) = exp 


G 


To(u>) + Pd(2q+g)Tg(' 
9 =1 




and let 


Pde(u; (3 d ) = \x(P d )] 1 ex P 


G 


TbM + Pd(2q+g)T g {u) 

9=1 

For any (/-dimensional vector x with all coordinates 0 or 1, the probability that X.; = x is then 


Pd{*;Pd) = ^2 Pd(*\u; Pd)Pde{u-, p d )- 


For the ?’-dimensional vector u with coordinates ujk, 1 < k < r, 

<? 

Pd{x\LO;Pd) = P{Xj,P dj U v y) - P d {q+j)) 

3 =1 

is the conditional probability that X* = x given that 6 1 = u. If, for 1 < k < r, 

9 

Sdfc( x i Pd) ^ Pdj-Ej ; 

3 =1 
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and if 


T/ / Y , R ^ _ TT exp (-Pg+jXj) 

^5 /-'d) 1 1 1 i / n a \ ' 

j_l ^ ( Hdj(j ) Pd(q-\-j)) 

then 

r 

p d (x|cj; /3 d ) = Vd(x, w; /3 d ) exp ^ s dfc (x; /3 d )w fc 

_fc=i 

The log likelihood is then 

n 

Wd) = 

i=l 

where 

tdiiPd) = log p d (Xi;f3 d ), 1 <i<n. 

For the maximum-likelihood estimate 0 d , Idifld) I s the supremum t dif of PdiPd) f° r all 
z^-dinrensional vectors /3 d . 

Unlike in the multivariate normal case, considerable care is needed in the polytomous case to 
understand when models really differ. For example, consider a positive constant Zk and a real 
constant Uk for 1 < k < r. Replace each u>k in fl*. by ZkCOk + Uk , divide each item discrimination 
(ij by Zk if v(j) = k, change each intercept parameter 7 j to 7 j — uyi 3 /zy- if v(j) = k. and let (3) 
continue to hold for u in fl with each r]km divided by ZkZ m . Then the probabilities p^(x; j3 d ) are 
unchanged for x in T. It follows that the selection of to consist of evenly spaced integers with 
mean 0 is equivalent in terms of the resulting model to any selection of f Ik in which the members 
of Qk are evenly spaced points. Thus Qk = {—1, 0,1} leads to the same model as Qk = {1,1.5,2}. 

In addition, the connection with the multivariate normal case is stronger than might at first 
be apparent. Define the covariance matrix D and the elements d km of D 1 as in the multivariate 
normal case. If rjkm = (1 — 6km/^)d km , consists of numbers (—Ck — 1/2 + 2 d)zk, 1 < d < Ck, 
where Zk > 0 , Zk approaches 0 and CkZk approaches 00 , then 0 ; converges in distribution to a 
multivariate normal random vector with mean 0 and with covariance matrix D. The argument 
required involves use of an auxiliary r-dimensional random vector u, which is independent of 
0j and has independent coordinates Uk with uniform distributions on (— Zk/2, Zk/2)- It is a 
straightforward matter to show that 0 .; + u is a continuous random vector with a density that 
approaches the multivariate normal density pe defined in ( 1 ). Application of Scheffe’s theorem and 
the Mann-Wald theorem yield the desired result (Rao, 1973, pp. 122-125). Given the previous 
observations concerning the effects of linear transformations of the elements of D/ c for 1 < k < r, 
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the practical consequence of the result is that, for any e > 0, there exists an integer c > 0 such 
that id* > i* ~ e whenever each Ck > c and 17 = ]/[fc=i ^k- Thus polytomous models must be 
competitive with multivariate normal models in terms of model fit for sufficiently large Ck■ As 
evident from the data analysis, polytomous models are attractive even for all Ck equal to 4 or 5, 
and it is possible to use 17 with somewhat fewer elements than ni—i ^k with little loss. 

3.2 The Stabilized Newton-Raphson Algorithm 

The log likelihood may be maximized by a simple variation on the stabilized Newton-Raphson 
algorithm (Haberman, 1974a, 1988). The EM algorithm can also be employed (von Davier, 2005; 
Xu & von Davier, 2007). In the polytomous case, no integrals are evaluated, so that adaptive 
Gauss-Hermite quadrature is not required and calculations are simpler. Nonetheless, some 
restrictions on the size of G are required to ensure that the model parameters are well-enough 
identified to permit a reasonable rate of convergence (Haberman, 2005a). Consider an iteration 
t > 0. At the start of the iteration, let (3 d t be an approximation for the maximum-likelihood 
estimate (3 c j °f Pd- The stabilized Newton-Raphson algorithm yields a new approximation 

Pd(t+ 1) = Pdt + a dtCdt- 

To define adt and Cdt> let and K* d < 1/2 be given positive constants, let Vi d be the gradient of 
id, let V 2 i d be the Hessian matrix of i d , let 1^ be the t'd-by-i^ identity matrix, let c d t > 0, let 

A dt = -V 2 id{Pdt) + ctdld, 


let 

Cdt = A dt^idt(fidt), 

let I Cdt I — K , an d let a d t > 0 satisfy 


td(Pd(t+i)) ~ td(Pdt) > aotKX'dtWdiPdt)- ( 5 ) 

As in the multivariate normal case, c d t = 0 and a d t = 1 are used if all constraints are satisfied. 
Procedures for finding alternative values are the same, and use of n d = 2 and n* d = 1/16 appears 
acceptable. 
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3.3 Estimated Expected Log Penalty 

As in the multivariate normal case, to evaluate the model, consider the expected log penalty 

Hd{z) = -q~ 1 E(£ di (z)) 

per item. Consider the minimum Id of H d ( z) for r'd-dinrensional vectors z such that Zj > 0 for 
1 < j < q. Let H d ((3 d ) = I- If the 2PL model with a polytomous ability vector is correct, then 
(3 d is defined as in the model definition and I d is the entropy per item of the vector X,;. If j3 d 
is uniquely defined, then (3 d converges to / 3 d with probability 1 as the sample size n goes to oo, 
whether or not the model holds, and I d = ( nq)~ 1 £ d {0 ) converges to I d ■ Let 

Z d = E(-V 2 £di(P d )), 

and let 

Y d = E(vedi(l3 d )[ve di (j3 d )}'). 

Then n 1,/2 (/3 rf — f3 d ) converges in distribution to a normal random vector with mean 0 and 
covariance matrix Zl^Y^ZT 1 . If the model holds, then Y^ = Z d and the covariance matrix is 
Y^ 1 . The scaled difference n l / 2 (I d — I d ) converges in distribution to a normal random variable 
with mean 0 and a variance equal to the variance of q~ 1 £ d i(f3). The expected value of I d is less 
than I d - As n approaches oo, 2 nq[I d — E(I d )\ converges to V’d = t r (Z^ 1 Y^). In addition, if Xo is 
independent of X.; for 1 < i < n and Xo has the same distribution as X,;, then the conditional 
expectation I d o of the log penalty —I d o(/3 d ) given Xj, 1 < i < n, for prediction of Xo satisfies the 
condition that nq(I d o — I d ) converges in distribution to a random variable with expectation 
and is v d if the model holds. More generally, i^ d is estimated by ^) d = tr(Z^ 1 Y ( ^), where 

Z d = —n~ 1 \ /2 £ d (f3 d ) 

and 

n 

Y d = n- l Y J ^Wd)[^Wd)]'- 
1=1 

Thus I d o may be approximated by I d o = I d + il>d/(nq)- If the model is correct, then the simplified 
Akaike approximation I da q = I + u/(nq ) may be employed. 


15 



3.4 Estimated Ability Parameters 

As in the multivariate normal case, the ability parameter 9j can be defined and approximated 
even if the underlying model is not accurate (?, ?). Let Odi be defined as a random vector such 
that the conditional distribution of 9 ( n given Xj = x is the same as the conditional distribution 
of a random vector 6* di given the random vector X* with values in T, where 6* = u, u> in 17, 
with probability pde{^',Pd) and the conditional probability that X* = x in T given 6* di = u is 
Prf(x|cj; (3 d ). Thus the conditional probability pd(u:\x; f3 d ) that 6 di = u> of given Xj = x is given 
by Bayes’s theorem to be 


Pd 0 |xH*; Pd) 


p d (x.\cj; P d )Pdo(u; Pd) 
Pd{*; Pd) 


If the model actually holds, then 0; in the model definition has the same distribution as 9 d i- 
The information per item provided by 9 d i is 


A d = Id~ Ido- 


Alternatively, qA d is the information that Xj provides concerning 9 dl - Here I d o is the expected 
value per item of the log penalty — logpd(X-j\9 d i', P d ) from use of the conditional probability 
approximation pd(X-i\9 d i) for the conditional probability given 9 d i for the observed value of Xj. 
One has 

<? 

Eie = Q 1 ^ Idje, 
j = 1 


where 


IdjG E(logP(Xij,p dj 9dkU) Pd(q+j)))- 


An application of Bayes’s theorem shows that I d e nray be estimated by 

n 

he = -(nq^Y^iPdi^uPd)}^ 1 ^Pd( x i|w;/3 d )k>gp,j(Xj|u>;/3 d ). 

i= 1 

It follows that A d has estimate = I d — I d e- 

The conditional expectation 9 d i = £?(0<jj|Xj) of 9 d i given Xj is found from Bayes’s theorem 
to be £ , 0 |x(Xj;/3 d ), where 

^de|x( x ; Pd) = updeixi^lx', Pd)- 

As in the multivariate normal case, although the expectation E(9 d i) = E(9 d i) of 9 d i is the 
zero vector 0 under the model, the expectation need not be 0 if the model does not hold. The 
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estimated conditional expectation of G ( n given X; is G dt = E d e\x.0^i, f3 d ). One may employ 0 dl as 
an estimate of G d i- The expectation E(G d i) is then estimated by the average G d = n- 1 Er= i 
The covariance matrix of Odi is 

Cov (Gd) = E([G dl - E(G di )][G d i - E(G di )}'). 


The corresponding estimate is 

n 

c w(0 d ) = n" 1 Yfidi - dd\[Odi - e d y. 

i =1 

The conditional covariance matrix Cov^(0|X) of 9 d i given X/ may be used to assess the 
accuracy with which the data X, determine G d i ■ One has Covj(0d|X) = Cov d o\ x(Xj;/3^), where 


Cov d0 | X (x;/3 d ) = ^[uj- ^ 0 |x(x;/3 d )][^ - E de ^(x; /3)]'p<W|xMx; 0 d ). 

The estimate of Covi(G d \X.) is then Cov.;(0d|X) = Cov rf0 | X (Xj; /3 d ). The expected conditional 
covariance matrix iH(Covdi(0d|X)) is then estimated by 


Cov(0 d |X) = n- 1 Covj(0d|X). 

i =1 


For any nonzero r-dimensional vector c, the reliability of c'G d i is 


Pd( c ) 


d Cov(G d )c 

c'E(Covi(G d \X))c + d Cov(G d )c 


As in the multivariate normal case, the reliability of dG d i is approximately the same in large 
samples, and the estimated reliability is then 


Pd( c ) 


dCov(G d )c 

c'Cov(0rf|X)c + c / Cov(0 ( j)c 


4 Application to Praxis Data 

To illustrate results, data from a Praxis examination were examined. The examination is 
a multiple-choice right-scored test of content knowledge for certification for elementary school 
teachers. The test includes 120 items divided into four sections of 30 items apiece. Sections 
measure knowledge of language arts, mathematics, social studies, and science. For the particular 
administration studied, two items were not used in scoring due to unsatisfactory performance, one 
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Table 1 

Estimated Expected Log Penalties per Item for Unidimensional Models 


Model 

Latent variable 

Estimated 
log penalty 

Akaike 

measure 

Gilula-Haberman 

measure 

Independent 


0.54539 

0.54555 

0.54555 

1PL 

Normal 

0.50785 

0.50811 

0.50811 

2PL 

Normal 

0.50115 

0.50147 

0.50148 

3PL 

Normal 

0.50034 

0.50083 


2PL 

Polytomous 

0.50121 

0.50153 

0.50154 


from the section on language arts and one from the section on social studies. As a consequence, 
29 items are used for language arts, 30 for mathematics, 29 for social studies, and 30 for science. 
Analysis included 6,168 examinees. 

Preliminary analysis of the data was based on one scale with 118 items. A summary of 
results can be found in Table 1. In this analysis, the univariate normal ability distribution was 
used with a 1PL, 2PL, and 3PL model to obtain a basic perspective on estimated expected log 
penalties per item. Adaptive quadrature used 9 points. For comparison, a 2PL model was also 
used with nine ability levels. These levels were the integers —4 to 4, and (3) was used for the 
ability distribution. A model that assumed that the Xjj were all independent was also considered 
to establish a further baseline. The Gilula-Haberman measure was omitted for the 3PL case due 
to problems with parameter identification for this model (the Hessian matrix was nearly singular, 
so that the correction was not approximated in a satisfactory manner). 

The preliminary analysis suggests that, relative to the independence model, the normal 1PL 
model represents an improvement of about 6.86% in the Akaike or Gilula-Haberman measures. 
The gain from the normal 2PL model is modest, for the improvement over the normal 1PL model 
is only about 1.30% for these two measures. The gain from the normal 2PL to the normal 3PL is 
very small, only 0.13% for the Akaike measure. The polytomous 2PL case studied is comparable 
to the normal 2PL model, for the loss in terms of the Akaike or Gilula-Haberman criterion is only 
0 . 01 %. 

The choice of 9 points for the polytomous model is not of unusual significance. Use of 7 evenly 
spaced points rather than 9 in the polytomous case defined by (3) only increases the Akaike and 
Gilula-Haberman measures by 0.018%. In the other direction, a polytomous model with 11 evenly 
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spaced points defined by (3) leads to Akaike and Gilula-Haberman measures only 0.004% greater 
than in the normal case. 

Use of (4) rather than (3) had relatively limited impact. Nonetheless, it is interesting to 
note that a model for 9 points that used linear, quadratic, cubic, and quartic components yielded 
essentially the same Gilula-Haberman measure as did the normal 2PL model and an Akaike 
measure that was about 0.002% smaller than for the normal 2PL model. Nonetheless, models not 
based on (3) generally involve somewhat more computational labor than do those based on (3). 

For normal models, the choice of the number of quadrature points had relatively little 
influence. Use of 5 or 7 quadrature points had virtually no effect. Even for 3 quadrature points, 
the Gilula-Haberman and Akaike criteria increased by only about 0.002% relative to those for 
9-point quadrature. Relative to 9-point adaptive quadrature, the extreme case of 2 points only 
increased the Gilula-Haberman and Akaike criteria by about 0.008%. 

Four-dimensional 2PL analysis was then considered for multivariate normal and for 
polytomous cases. Results are provided in Table 2. The normal case reported used 4 points 
for each dimension, so that 256 = 4 4 four-dimensional vectors were involved in the required 
multidimensional quadratures. Essentially the same results can be obtained for other selections 
of numbers of points per dimension such as 6 points for the first dimension and 3 points for the 
remaining three dimensions, so that 162 four-dimensional vectors are required per quadrature. 
Increases in Akaike and Gilula-Haberman measures of about 0.004% are observed with 3 points 
per dimension. 

A variety of multidimensional polytomous models were explored. A base model used 4 evenly 
spaced points for each dimension and all possible combinations of these points with a log linear 
model defined by (3). A second model used 5 evenly spaced points from —2 to 2 with a log linear 
model defined by (3); however, vectors were excluded whenever the difference between any two 
coordinates exceeded 2. Thus (—2, —1, 0, —1) was in H, but (—2, —1,1, —1) was not in H. In all, 
H contained 211 points. The third polytomous model used 6 evenly spaced points from —5 to 5 
and a log linear model defined by (3), but vectors were excluded if any two coordinates differed by 
more than 4, so that H contained 276 points. Thus (—5, —3, —1, —3) was in H but (—5, —3,1, —3) 
was not in H. The last model used 7 evenly spaced points from —3 to 3, and vectors were excluded 
if any two coordinates differed by more than 4. Thus H contained 341 points. 

In this example, some gain is achieved by use of a multidimensional analysis. In the normal 


19 



Table 2 

Estimated Expected Log Penalties per Item for Four-Dimensional 2PL Models 


Latent variable 

Latent classes 
per variable 

Estimated 
log penalty 

Akaike 

measure 

Gilula-Haberman 

measure 

Multivariate normal 


0.49856 

0.49889 

0.49890 

Polytonrous 

4 

0.49947 

0.49981 

0.49982 

Polytonrous 

5 

0.49888 

0.49922 

0.49923 

Polytonrous 

6 

0.49871 

0.49905 

0.49906 

Polytonrous 

7 

0.49861 

0.49895 

0.49895 


case, the four-dimensional model results in a reduction of the Akaike or Gilula-Haberman criterion 
by 0.514% relative to the one-dinrensional model. This percentage change is much smaller than the 
change from a one-dinrensional normal 1PL model to a one-dinrensional normal 2PL model, but 
it is far larger than the change from a one-dinrensional normal 2PL model to a one-dinrensional 
normal 3PL model or from a one-dinrensional polytonrous 2PL model with nine latent classes with 
probabilities satisfying (3) to a one-dinrensional normal 2PL model. 

Differences between the normal case and the polytonrous case are rather modest, although 
some details are worth considering. In all cases in Table 2, the normal case is more successful; 
however, the 7-point model increases the Akaike and Gilula-Habernran criteria by only 0.010 to 
0.012%. Even for the 4-point example, the increase in the two criteria is only 0.184%. Use of more 
general log linear models than the model defined by (3) had little effect. At least for the data 
under study, model choice is likely to depend on the amount of computation regarded as tolerable 
and on considerations related to interpretation of test results. 

Results are also rather similar in terms of estimated information on 6 and in terms of 
reliability coefficients for estimated ability coordinates. Table 3 provides a summary of estimates 
of the information concerning 6 , provided by Xj for the models considered. On the whole, the 
estimates are quite similar, but again the multivariate normal case provides the best result, and 
the polytonrous case with 7 points per dimension has an estimate that is about 1.24% smaller. For 
comparison, note that the estimated information for 9{ for a one-dinrensional normal 2PL model 
is 1.35963, so that the gain in the four-dinrensional case is quite clear. 

The estimated reliability coefficients for the four coordinates of 9i are quite similar for all 2PL 
models. Consider Table 4. The composite listed is the sum of the coordinates. For comparison, 
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Table 3 

Estimated Information on 8[ Provided by X, 


Latent variables 

Latent classes 
per variable 

Estimated 

information 

Multivariate normal 

2.14457 

Polytomous 

4 

1.99651 

Polytomous 

5 

2.07177 

Polytomous 

6 

2.09598 

Polytomous 

7 

2.11788 


Table 4 

Estimated Reliability Coefficients 
for Ability Estimates for Four-Dimensional 2PL Models 


Latent variables 

Latent classes 
per variable 

Language 

arts 

Mathematics 

Social 

studies 

Science 

Composite 

Normal 


0.86892 

0.87644 

0.83767 

0.88038 

0.92495 

Polytomous 

4 

0.86581 

0.88014 

0.83331 

0.87410 

0.93046 

Polytomous 

5 

0.86752 

0.88098 

0.83430 

0.97704 

0.92986 

Polytomous 

6 

0.86807 

0.87820 

0.83541 

0.88017 

0.92903 

Polytomous 

7 

0.86860 

0.87940 

0.83711 

0.87915 

0.92852 


note that the reliability estimate for the one-dinrensional normal case is 0.92620, and the estimated 
Cronbach alpha for the sum of the 118 item scores is 0.92408. The score sums for the individual 
sections have respective estimated Cronbach alpha statistics of 0.77088 for language arts, 0.84378 
for mathematics, 0.71289 for social studies, and 0.77074 for science. The improved estimated 
reliability for estimated conditional means of coordinates of 6 , reflects exploitation of correlations 
between section scores. Results are rather similar, albeit slightly better, than the proportional 
reduction in mean-squared error achieved by use of the observed section score sum and observed 
total test score to predict the true section score (Haberman, 2005b). For these data, the estimated 
proportional reductions are 0.85757 for language arts, 0.87370 for mathematics, 0.81210 for social 
studies, and 0.86742 for science. 

Estimated model parameters for the multivariate normal and polytomous cases are quite 
closely linked, although some care must be taken to treat differences in scaling of variables. This 
issue is especially significant when estimated item discriminations are studied. Conditional on 
the test component, the sample correlation of estimated item discriminations for any pair of 
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Table 5 

Estimated Correlations of Ability Coordinates 


Latent var. 

Classes 

per var. 

LA by M 

LA by SS 

LA by S 

M by SS 

M by S 

SS by S 

Normal 


0.84538 

0.83086 

0.88139 

0.72643 

0.82708 

0.89262 

Polytomous 

4 

0.86155 

0.79247 

0.84659 

0.69808 

0.80617 

0.81064 

Polytomous 

5 

0.82869 

0.81657 

0.86311 

0.71272 

0.81040 

0.87563 

Polytomous 

6 

0.86808 

0.86282 

0.88860 

0.81620 

0.85890 

0.90792 

Polytomous 

7 

0.84035 

0.82818 

0.87485 

0.72482 

0.82361 

0.88729 


Note. LA = language arts, M = mathematics, SS = social studies, S science. 


models is never less than 0.99880 and for the normal and 7-point polytomous cases, the sample 
correlation is at least 0.99994; however, sample means of item discriminations for the different 
models are quite different. The polytomous models with 4 or 6 points per dimension have item 
discriminations roughly half of those in the normal case, while the polytomous models with 5 or 7 
points per dimension have item discriminations somewhat larger than for the other polytomous 
cases but appreciably smaller than in the normal case. In the case of the item intercepts, the 
sample correlations for a specific test are all at least 0.99958, and at least 0.99999 for the normal 
and 7-point polytomous pair. Here effects of scaling are much smaller, especially if the 4-point 
polytomous case is excluded. 

Comparison of estimated distributions of 6i is more complex given the problem of scaling; 
however, it is worth note that estimated correlations of coordinates of Oi are somewhat similar 
for the various models, but they do not agree very precisely. Consider Table 5. The 7-class 
case exhibits particularly good agreement with normal results. The correlations are quite high, 
although mathematics and social studies are less highly correlated than are other pairs of 
disciplines. 


5 Conclusions 

The example suggests that either a multivariate normal or a polytomous ability distribution 
can be used to achieve rather similar results for 2PL models for multidimensional item response 
analysis. Either the stabilized Newton-Raphson methods or the EM algorithm may be employed 
in computations. In this example, the multivariate normal ability distribution generally had 
a slight advantage; however, the difference was remarkably modest. Client preferences could 
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influence any decision concerning which model to use. It is possible that other examples will arise 
in which differences between approaches have more substantial consequences. 

Computational burden for analysis appears acceptable, although many details of calculation 
would best be modified for much larger samples. It would probably be advisable to begin 
calculations with a few hundred or few thousand observations to establish good approximations to 
maximum-likelihood estimates. The approximations would then be used to complete computations 
with the full sample. When computational labor is a major issue, then it is likely that the use of 
adaptive quadrature in the multivariate normal case with only 2 or 3 points per dimension will be 
the most attractive option. 

The use of multidimensional item response models to generate subscores is quite feasible, as 
evident from the example. Given the similarity in results to those based on classical test theory, 
client preferences may again be a significant consideration. 

The example used in the analysis was selected because the skills assessed were not closely 
linked. It should be emphasized that multidimensional item response analysis is not likely to 
reveal anything useful when skills are very tightly linked. Indeed, the estimation of the ability 
distribution will become increasingly challenging as correlations of ability coordinates approach 1. 

The techniques used in this report can be applied quite readily to multidimensional versions 
of generalized partial credit models and to cases in which covariates are present or in which not 
all items are presented to each examinees (Xu & von Davier, 2006); however, these generalizations 
have not yet been fully implemented for all cases considered in this report. 
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